Form Design for High Accuracy Optical Character Recognition

Authors

  • Michael D. Garris
  • Darrin L. Dimmick
Abstract

National Institute of Standards and Technology, Building 225, Room A216, Gaithersburg, Maryland 20899
Phone: (301) 975-2928, FAX: (301) 840-1357
Published in IEEE Transactions PAMI, June 1996.

ABSTRACT

Financial institutions, insurance companies, and government agencies are all aggressively pursuing the integration of automated forms processing into their everyday work flows. To use existing optical character recognition (OCR) technology, forms whose contents are currently keyed by hand will probably need to be redesigned. This paper presents some of the quantitative results generated by a comprehensive study of three versions of a redesigned tax form. Analyses show that representing fields with separately spaced bounding character boxes provides better machine readability than fields without character boxes, fields containing vertical ticks (combs), and fields with adjoining character boxes. It is also shown that character boxes containing two vertically stacked ovals are much more difficult for writers to complete than empty character boxes. The analyses also provide quantitative proof that idiosyncratic writer responses on forms are the major source of errors within the recognition system. These idiosyncrasies (such as writers crossing out or writing over characters they have already printed) must be handled effectively in order to improve recognition performance. This paper demonstrates how form design can help, and it provides empirical data supporting some common rules of thumb for form design by measuring the impact that specific changes to a form have on machine readability and on the writer.
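Comparing "machine readability" across field designs ultimately means scoring recognizer output against the true field contents and aggregating the scores per design. The sketch below assumes one common metric, pooled character accuracy defined as 1 − (total Levenshtein edit distance / total reference characters); the abstract does not state that the study used this exact metric. The function names, design labels, and sample strings are hypothetical and illustrative only, not data or results from the paper.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def accuracy_by_design(results):
    """results: iterable of (design, truth, recognized) tuples.

    Returns {design: accuracy}, where accuracy pools all fields of a design:
    1 - total edit distance / total reference characters, clamped at 0.0
    because insertions can push the raw value below zero.
    """
    errors = defaultdict(int)
    chars = defaultdict(int)
    for design, truth, recognized in results:
        errors[design] += levenshtein(truth, recognized)
        chars[design] += len(truth)
    return {d: max(0.0, 1 - errors[d] / chars[d]) for d in chars if chars[d] > 0}


if __name__ == "__main__":
    # Hypothetical recognizer output for two field styles (not study data).
    sample = [
        ("separate boxes", "12345", "12345"),
        ("separate boxes", "678", "678"),
        ("comb", "12345", "1234"),   # one dropped character
        ("comb", "678", "628"),      # one substituted character
    ]
    print(accuracy_by_design(sample))
    # → {'separate boxes': 1.0, 'comb': 0.75}
```

Pooling errors and characters per design, rather than averaging per-field accuracies, keeps long fields from being outweighed by short ones. A field-level metric (the fraction of fields recognized exactly) is an equally reasonable alternative under different assumptions.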

Similar resources

High accuracy handwritten Chinese character recognition using LDA-based compound distances

Article history: Received 9 December 2007 Received in revised form 11 April 2008 Accepted 15 April 2008

A Robust Free Size OCR for Omni-Font Persian/Arabic Printed Document Using Combined MLP/SVM

Optical character recognition of cursive scripts presents a number of challenging problems in both the segmentation and recognition processes, and this attracts many researchers in the field of machine learning. This paper presents a novel approach based on a combination of MLP and SVM to design a trainable OCR for Persian/Arabic cursive documents. The implementation results on a comprehensive databas...

Important New Developments in Arabographic Optical Character Recognition (OCR)

Leipzig University’s (LU) Alexander von Humboldt Chair for Digital Humanities—has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines (~400 pages, 87,000 words; see Table 1 for full details). The...

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, processing of bank cheques, and conversion of any handwritten document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Design of an Optical Character Recognition System for Camera-based Handheld Devices

This paper presents a complete Optical Character Recognition (OCR) system for camera captured image/graphics embedded textual documents for handheld devices. At first, text regions are extracted and skew corrected. Then, these regions are binarized and segmented into lines and characters. Characters are passed into the recognition module. Experimenting with a set of 100 business card images, ca...

Journal:
  • IEEE Trans. Pattern Anal. Mach. Intell.

Volume 18

Publication date: 1996